139 research outputs found
Mobile Data Management
Managing data in the mobile computing environment poses new and challenging problems. Existing software needs to be upgraded to accommodate this environment. To do so, the critical parameters need to be understood and defined. We have surveyed some problems and existing solutions
Controlling Web Query Execution in a Web Warehouse
Most contemporary Web query systems have limited capabilities for controlling Web query execution. Such a facility is important because it gives us an opportunity to optimize the evaluation of a Web query. We address this issue in the context of our Web warehousing system called WHOWEDA (Warehouse Of Web Data). Specifically, we investigate different types of constraints (related to query execution) that may be imposed on a Web query, such as the number of query results, the time of execution, and restriction of the evaluation to a specified set of Web sites. An important feature of our approach is that it attempts to address the query evaluation issues that may arise due to the existence of broken links and forms in the Web.
ARENA: Towards Informative Alternative Query Plan Selection for Database Education
A key learning goal of learners taking a database systems course is to
understand how SQL queries are processed in an RDBMS in practice. To this end,
comprehension of the cost-based comparison of different plan choices to select
the query execution plan (QEP) of a query is paramount. Unfortunately,
off-the-shelf RDBMS typically only expose the selected QEP to users without
revealing information about representative alternative query plans considered
during QEP selection in a learner-friendly manner, hindering the learning
process. In this paper, we present a novel end-to-end and generic framework
called ARENA that facilitates exploration of informative alternative query
plans of a given SQL query to aid the comprehension of QEP selection. Under the
hood, ARENA addresses a novel problem called alternative plan selection problem
(TIPS) which aims to discover a set of k alternative plans from the underlying
plan space so that the plan interestingness of the set is maximized.
Specifically, we explore two variants of the problem, namely batch TIPS and
incremental TIPS, to cater to a diverse set of learners. Due to the computational
hardness of the problem, we present a 2-approximation algorithm to address it
efficiently. Exhaustive experimental study with real-world learners
demonstrates the effectiveness of ARENA in enhancing learners' understanding of
the alternative plan choices considered during QEP selection.
Association Rules for Web Data Mining in WHOWEDA
The authors discuss association rules which can be discovered from Web data. The association rules are discussed within the scope of our WHOWEDA (warehouse of Web data) project. WHOWEDA is supported by a Web data model and a set of algebraic operators. The Web data model allows a uniform and integrated view of Web data gathered using a user's query graph. A user's query graph describes the query by example (what the user perceives as the query), and the Web coupling query gathers instances of such a query graph from the Web and stores them in the form of subgraphs (called Web tuples) in a Web table. We discuss association rules within this domain. An association rule defines an association between the node and link attributes of Web tuples within a Web table. There are two different classes of association rules that can be developed from data in a Web table. Node-to-node associations are those rules that relate the content (defined by metadata attributes) between two or more nodes within a Web tuple. Link associations are rules that show the connectivity of different URLs. Distinguishing the two types of associations provides a view of the structure of the Web data. The goal of performing Web association mining on Web data is to better organize searching patterns through hyperlinked documents
DKWS: A Distributed System for Keyword Search on Massive Graphs (Complete Version)
Due to the unstructuredness and the lack of schemas of graphs, such as
knowledge graphs, social networks, and RDF graphs, keyword search for querying
such graphs has been proposed. As graphs have become voluminous, large-scale
distributed processing has attracted much interest from the database research
community. While there have been several distributed systems, distributed
querying techniques for keyword search are still limited. This paper proposes a
novel distributed keyword search system called DKWS. First, we
present a monotonic property with keyword search algorithms that
guarantees correct parallelization. Second, we present a keyword search
algorithm as monotonic backward and forward search phases. Moreover, we propose
new tight bounds for pruning nodes being searched. Third, we propose a
notify-push paradigm and PINE programming model of DKWS. The
notify-push paradigm allows asynchronously exchanging the upper bounds of
matches across the workers and the coordinator in DKWS. The PINE
programming model naturally fits keyword search algorithms, as they have
distinguished phases, to allow preemptive searches to mitigate staleness
in a distributed system. Finally, we investigate the performance and
effectiveness of DKWS through experiments using real-world datasets. We find
that DKWS is up to two orders of magnitude faster than related techniques,
and its communication costs are times smaller than those of other
techniques
Influence Maximization in Social Networks: A Survey
Online social networks have become an important platform for people to
communicate, share knowledge and disseminate information. Given the widespread
usage of social media, individuals' ideas, preferences and behavior are often
influenced by their peers or friends in the social networks that they
participate in. Over the last decade, the influence maximization (IM) problem
has been extensively adopted to model the diffusion of innovations and ideas.
The purpose of IM is to select a set of k seed nodes that can influence the
most individuals in the network.
In this survey, we present a systematic study of the research on, and
future directions for, the IM problem. We review the information
diffusion models and analyze a variety of algorithms for the classic IM
problem. We propose a taxonomy to help readers understand the key
techniques and challenges. We also organize the milestone works in
chronological order so that readers of this survey can follow the research
roadmap of this field. Moreover, we categorize other application-oriented IM
studies and examine each of them. Finally, we list a series of open
questions as future directions for IM-related research, from which a
reader of this survey can readily see what should be done next in this
field
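The seed-selection task described in this abstract (pick k seed nodes that influence the most individuals) can be illustrated with a small sketch. The abstract does not name a specific algorithm or diffusion model, so everything here is an assumption chosen for illustration: the independent cascade model with one uniform activation probability `p`, Monte Carlo estimation of spread, and the plain greedy hill-climbing strategy. The function names and the toy graph are hypothetical.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """Run one independent-cascade diffusion and return the activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, p, runs, rng):
    """Estimate the expected number of activated nodes by Monte Carlo simulation."""
    total = sum(len(simulate_ic(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs


def greedy_im(graph, k, p=0.1, runs=200, seed=0):
    """Greedily add the node with the largest estimated spread, k times."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    seeds = []
    for _ in range(min(k, len(nodes))):
        candidates = sorted(nodes - set(seeds))
        best = max(
            candidates,
            key=lambda n: expected_spread(graph, seeds + [n], p, runs, rng),
        )
        seeds.append(best)
    return seeds


# Toy directed graph (hypothetical): adjacency lists of who can influence whom.
toy_graph = {"a": ["b", "c", "d"], "e": ["f"]}
```

With `p=1.0` every edge fires, so the estimate is exact and `greedy_im(toy_graph, 2, p=1.0, runs=1)` selects the two nodes that together reach the whole toy graph. For realistic probabilities the result depends on the number of simulation runs and the random seed.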
Cost-benefit Analysis of Web Bag in a Web Warehouse
Sets and bags are closely related structures and have been studied in relational databases. A bag is different from a set in that it is sensitive to the number of times an element occurs, while a set is not. In this paper, we introduce the concept of a Web bag in the context of a World Wide Web warehouse called WHOWEDA (WareHouse Of WEb DAta) which we are currently building. Informally, a Web bag is a Web table which allows multiple occurrences of identical Web types. A Web bag helps one to discover useful knowledge from a Web table, such as visible documents or Web sites (i.e. documents/sites which can be reached by many paths), luminous documents (i.e. documents with many outgoing links) and luminous paths (i.e. frequently traversed paths). In this paper, we provide a cost-benefit analysis of materializing Web bags as compared to Web tables with distinct Web tuples
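The set-versus-bag distinction in this abstract can be shown with a few lines of Python. This is only a sketch under simplifying assumptions: each Web tuple is reduced to a hypothetical path of host names, and "visibility" is approximated by counting how many stored paths end at a document. It is not WHOWEDA's data model or operator set.

```python
from collections import Counter

# Hypothetical Web tuples, each reduced to the path of documents it traverses.
web_tuples = [
    ("a.example", "b.example", "d.example"),
    ("a.example", "c.example", "d.example"),
    ("a.example", "b.example", "d.example"),  # identical to the first tuple
]

as_set = set(web_tuples)      # duplicates collapse: occurrence counts are lost
as_bag = Counter(web_tuples)  # a bag keeps how many times each tuple occurs

# Count how many stored paths reach each final document, duplicates included.
visibility = Counter(path[-1] for path in as_bag.elements())

# Paths that occur more than once in the bag (frequently traversed paths).
repeated_paths = [path for path, count in as_bag.items() if count > 1]
```

The set view cannot tell that one path was gathered twice, while the bag view keeps that multiplicity, which is the information needed to rank frequently traversed paths.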
Reducing Cognitive Overheads in a Web Warehouse using Reverse-Osmosis
This paper provides a quantitative analysis of reducing cognitive overheads in a Web warehouse using an important class of operation called reverse-osmosis. The analysis is used to examine two different cognitive overheads: locating relevant nodes or information, and the display time of a Web table. A reverse-osmosis operation enables us to eliminate irrelevant information from a collection of Web documents stored in the form of a Web table. We call such an operation reverse-osmosis because it is analogous to the reverse osmosis process in the field of water purification. We discuss a formal algorithm of the reverse-osmosis operation